Using a Solver Over the String Pattern Domain to Analyze Gene Promoter Sequences
Authors
Christophe Rigotti, Laboratoire LIRIS CNRS UMR 5205, INSA-Lyon, 69621 Villeurbanne, France
Ieva Mitašiūnaitė, Faculty of Mathematics and Informatics, Vilnius University, Lithuania
Laurène Meyniel, Laboratoire LIRIS CNRS UMR 5205, INSA-Lyon, 69621 Villeurbanne, France
Jean-François Boulicaut, Laboratoire LIRIS CNRS UMR 5205, INSA-Lyon, 69621 Villeurbanne, France
Olivier Gandrillon, Centre de Génétique Moléculaire et Cellulaire CNRS UMR 5534, Université Claude Bernard Lyon I, 69622 Villeurbanne, France
Abstract
This chapter illustrates how inductive querying techniques can be used to support knowledge discovery from genomic data. More precisely, it presents a data mining scenario for discovering putative transcription factor binding sites in gene promoter sequences. We do not give technical details of the constraint-based data mining algorithms used, which have been described previously. Our contribution is an abstract description of the scenario, its concrete instantiation, and a typical execution on real data. Our main extraction algorithm is a complete solver dedicated to the string pattern domain: it computes the string patterns that satisfy a given conjunction of primitive constraints. We also discuss the processing steps needed to turn it into a useful tool. In particular, we introduce a parameter tuning strategy, an appropriate measure for ranking the patterns, and the post-processing approaches that can be, and have been, applied.
Similar resources
Comparison of Promoter Sequences of Flowering Control Genes, FT1 and Three Versions of VIN3, in Susceptible and Resistant Sugar Beet Genotypes to Bolting
Autumn sowing of sugar beet is a suitable practice in sustainable agriculture. Bolting is an undesirable phenomenon that reduces sugar beet yield, and it is the most important limiting factor in autumn sowing of sugar beet. Identifying and comparing the sequences of flowering genes in various genotypes can help to understand the molecular mechanisms controlling bolting. In the previous studies...
Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds
The aim of this study was to perform a molecular and bioinformatics analysis of the IGFBP2 gene promoter in association with some economic traits in the indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB sheep, and a 297 bp fragment from the upstream sequences of the studied gene was amplified and genotyped by single-strand conformational polymo...
Mitašiūnaitė Mining String Data under Similarity and Soft-Frequency Constraints: Application to Promoter Sequence Analysis
An inductive database is a database that contains not only data but also patterns. Inductive databases are designed to support the KDD process. Recent advances in inductive database research have given rise to generic solvers capable of solving inductive queries that are arbitrary Boolean combinations of anti-monotonic and monotonic constraints. They are designed to mine different types of p...
No d'ordre : 2009-ISAL-0036, Année 2009
An inductive database is a database that contains not only data but also patterns. Inductive databases are designed to support the KDD process. Recent advances in inductive database research have given rise to generic solvers capable of solving inductive queries that are arbitrary Boolean combinations of anti-monotonic and monotonic constraints. They are designed to mine different types of p...
In silico screening of G-Quadruplex Structures in Wilms tumor 1 Gene Promoter
Introduction: X-ray diffraction studies have revealed that guanines in a DNA strand may be arranged in quartets and form a structure called a G-quadruplex. Bioinformatics studies have suggested that G-quadruplex structures form in crucial human genes, including Wilms tumor 1 (WT1). The aim of this study was the in silico analysis of the guanine-rich sequence in the promoter region of the WT1 gene...
Publication date: 2010